[GPU] Optimize merge memory usage #136411
Pinging @elastic/es-search-relevance (Team:Search Relevance)
libs/simdvec/src/main/java/org/elasticsearch/simdvec/QuantizedByteVectorValuesAccess.java
"-Dio.netty.noUnsafe=true",
"-Dio.netty.noKeySetOptimization=true",
"-Dio.netty.recycler.maxCapacityPerThread=0",
// temporary until we get access to raw vectors in a future Lucene version
Is there an open Lucene issue or PR for that?
Not yet; depending on how #136416 goes, and the opinion of people more expert in Lucene (you, Chris, Ben), I'd like to generalize what we did there and raise a Lucene issue to have it.
FYI, I honestly don't see why Lucene would ever expose this information. It expands an API for no good purpose within Lucene.
Maybe not this API as-is, but I think there is value in being able to access, in a convenient and efficient way, what has already been written: it avoids writing the same data more than once, or keeping copies in memory, when we need the original data (e.g. raw vectors) to add "something" on top of it (e.g. quantized vectors, a graph, etc.).
(But maybe I'm naive)
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;

public class VectorsFormatReflectionUtils {
Wow, very nice organization! +1 for using VarHandle for reflection
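For readers unfamiliar with the pattern the reviewer is praising: a `VarHandle` obtained from a private lookup lets you read another class's private field with near-direct-access performance, because the handle is resolved once and reused. The sketch below is illustrative only (the `Holder` class and field name are hypothetical, not the PR's actual code):

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;

// Hypothetical class whose private state we want to read reflectively.
class Holder {
    private int count = 42;
}

public class VarHandleReflectionDemo {
    // Resolved once at class initialization and reused; far cheaper than
    // calling Field#setAccessible/Field#get on every access.
    private static final VarHandle COUNT;
    static {
        try {
            COUNT = MethodHandles
                .privateLookupIn(Holder.class, MethodHandles.lookup())
                .findVarHandle(Holder.class, "count", int.class);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    public static int readCount(Holder h) {
        // The cast is required: VarHandle#get is signature-polymorphic.
        return (int) COUNT.get(h);
    }
}
```

Note that `privateLookupIn` requires the target class's module to be open to the caller, which is why this approach suits same-codebase utilities like the one in this PR.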
@ldematte Great work. I have not tested it yet, but the way you organized it is impressive. My main comment: do you think we can simplify this PR by breaking it into two separate ones, making this PR only about changes to merges, and doing the changes for flush, ResourcesHolder, and 128Mb in a separate PR? Or are these changes tightly coupled?
...rc/main/java/org/elasticsearch/index/codec/vectors/reflect/VectorsFormatReflectionUtils.java
...rc/main/java/org/elasticsearch/index/codec/vectors/reflect/VectorsFormatReflectionUtils.java
x-pack/plugin/gpu/src/main/java/org/elasticsearch/xpack/gpu/codec/ES92GpuHnswVectorsWriter.java
I can do that: here is the PR #136464
x-pack/plugin/gpu/src/main/java/org/elasticsearch/xpack/gpu/codec/ES92GpuHnswVectorsWriter.java
@ldematte Great changes. I have done some benchmarking on my laptop with int8, and I see great recall but, surprisingly, no speedups compared with the main branch:
gist: 1_000_000 docs; 960 dims; euclidean metric
cohere-wikipedia_v2: 934_024 docs; 768 dims; cosine metric
// problems with strides; the explicit copy removes the stride while copying.
// Note that this is _not_ an additional copy: input data needs to be moved to GPU memory anyway,
// we are just doing it explicitly instead of relying on CagraIndex#build to do it.
var deviceDataSet = dataset.toDevice(resourcesHolder.resources())
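To illustrate what "removing the stride while copying" means: when rows of a matrix are stored with a stride larger than the row width, compacting them into a tight buffer makes each row start exactly `dims` floats after the previous one. This standalone sketch shows the idea with plain arrays; the class and method names are illustrative, not the cuVS or PR API:

```java
// Sketch: compact a strided row-major float matrix into a contiguous buffer.
public class StrideCompactor {
    /**
     * Copies {@code rows} rows of {@code dims} floats each from a source
     * laid out with the given {@code stride} into a contiguous destination
     * whose effective stride equals {@code dims}.
     */
    public static float[] compact(float[] strided, int rows, int dims, int stride) {
        if (stride < dims) {
            throw new IllegalArgumentException("stride must be >= dims");
        }
        float[] contiguous = new float[rows * dims];
        for (int r = 0; r < rows; r++) {
            // Row r starts at r * stride in the source, r * dims in the target.
            System.arraycopy(strided, r * stride, contiguous, r * dims, dims);
        }
        return contiguous;
    }
}
```

As the diff comment notes, when the destination of such a copy is device memory, the compaction comes for free: the transfer had to happen anyway.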
Nice workaround; so you also confirmed that strides don't work properly with the Cagra index implementation?
Great work, @ldematte
This PR changes how we gather and compact vector data for transmission to the GPU. Instead of writing the compacted arrays out to a temporary file, we directly use the vector values from the scorer supplier, which are backed by a memory-mapped input. This way we avoid an additional copy of the data.
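The shape of the change can be sketched in isolation: rather than serializing each vector to a temp file and reading the file back, the merge path pulls each vector through a random-access view and gathers it straight into the flat buffer destined for the GPU. All names below are illustrative placeholders, not the PR's actual API; the real code works against the scorer supplier's vector values:

```java
import java.util.function.IntFunction;

// Sketch of the "no temp file" gather path, under assumed names.
public class DirectVectorGatherSketch {
    /**
     * Gathers {@code count} vectors of {@code dims} floats each into one
     * flat array, reading each vector on demand from {@code vectorByOrd}.
     * In the PR, the equivalent of this lambda is backed by a
     * memory-mapped input, so no intermediate file or duplicate
     * on-heap copy of the whole dataset is ever materialized.
     */
    public static float[] gather(IntFunction<float[]> vectorByOrd, int count, int dims) {
        float[] flat = new float[count * dims];
        for (int ord = 0; ord < count; ord++) {
            System.arraycopy(vectorByOrd.apply(ord), 0, flat, ord * dims, dims);
        }
        return flat;
    }
}
```

The design point is that the only full copy of the data that ever exists is the one the GPU transfer requires anyway.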